{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from scipy import sparse as spsp\n",
    "import pandas as pd\n",
    "import dgl\n",
    "import torch"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Synthesize data\n",
    "\n",
    "Please use the 'synthetic_data.ipynb' notebook to synthesize data in this tutorial."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Construct a homogeneous graph\n",
    "\n",
    "'edges.csv' stores a graph with multiple edge types. Although the graph has multiple edge types and node types, we can still construct it as a homogeneous graph. In this case, we create `DGLGraph` for this graph and stores node types and edge types as node data and edge data. In this graph, each node has two features."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "node_data = pd.read_csv('node_data.csv', header=None)\n",
    "edges = pd.read_csv('edges.csv', header=None)\n",
    "\n",
    "num_edges = edges.shape[0]\n",
    "spm = spsp.coo_matrix((np.ones(num_edges), (edges[0], edges[1])))\n",
    "g = dgl.DGLGraph(spm, readonly=True)\n",
    "ndata1 = np.expand_dims(np.array(node_data[1]), 1)\n",
    "ndata2 = np.expand_dims(np.array(node_data[2]), 1)\n",
    "ndata = np.concatenate([ndata1, ndata2], 1)\n",
    "g.ndata['feats'] = torch.tensor(ndata)\n",
    "g.edata['rel'] = torch.tensor(np.array(edges[2]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Construct a heterogeneous graph with one node type\n",
    "\n",
    "For the graph above, we can construct a heterogeneous graph with one node type. We first need to create an adjacency matrix for each edge type and call `heterograph` to create `DGLHeteroGraph`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "node_data = pd.read_csv('node_data.csv', header=None)\n",
    "edges = pd.read_csv('edges.csv', header=None)\n",
    "\n",
    "rel = np.array(edges[2])\n",
    "spm_dict = {}\n",
    "src = edges[0][rel==0]\n",
    "dst = edges[1][rel==0]\n",
    "spm_dict['a', '0', 'a'] = spsp.coo_matrix((np.ones(len(src)), (src, dst)))\n",
    "\n",
    "src = edges[0][rel==1]\n",
    "dst = edges[1][rel==1]\n",
    "spm_dict['a', '1', 'a'] = spsp.coo_matrix((np.ones(len(src)), (src, dst)))\n",
    "\n",
    "src = edges[0][rel==2]\n",
    "dst = edges[1][rel==2]\n",
    "spm_dict['a', '2', 'a'] = spsp.coo_matrix((np.ones(len(src)), (src, dst)))\n",
    "\n",
    "g = dgl.heterograph(spm_dict)\n",
    "\n",
    "ndata1 = np.expand_dims(np.array(node_data[1]), 1)\n",
    "ndata2 = np.expand_dims(np.array(node_data[2]), 1)\n",
    "ndata = np.concatenate([ndata1, ndata2], 1)\n",
    "g.ndata['feats'] = torch.tensor(ndata)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Construct a heterogeneous graph with multiple node types"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "'edges1.csv' stores a graph with 3 edge types and 2 node types. We can construct a heterogeneous graph in a similar way as above. We first need to create an adjacency matrix for each edge type and call `heterograph` to create `DGLHeteroGraph`. Note: The third edge type connects nodes of type 'a' and 'b'.\n",
    "\n",
    "After constructing the heterogeneous graph, we need to assign node data to each node type. We need to assign data to the nodes of type 'a' and 'b' separately."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "edges = pd.read_csv('edges1.csv', header=None)\n",
    "node_data = pd.read_csv('node_data.csv', header=None)\n",
    "node_data1 = pd.read_csv('node_data1.csv', header=None)\n",
    "\n",
    "rel = np.array(edges[2])\n",
    "spm_dict = {}\n",
    "src = edges[0][rel==0]\n",
    "dst = edges[1][rel==0]\n",
    "spm_dict['a', '0', 'a'] = spsp.coo_matrix((np.ones(len(src)), (src, dst)))\n",
    "\n",
    "src = edges[0][rel==1]\n",
    "dst = edges[1][rel==1]\n",
    "spm_dict['a', '1', 'a'] = spsp.coo_matrix((np.ones(len(src)), (src, dst)))\n",
    "\n",
    "src = edges[0][rel==2]\n",
    "dst = edges[1][rel==2]\n",
    "spm_dict['a', '2', 'b'] = spsp.coo_matrix((np.ones(len(src)), (src, dst)))\n",
    "\n",
    "g = dgl.heterograph(spm_dict)\n",
    "\n",
    "ndata = np.concatenate([np.expand_dims(np.array(node_data[1]), 1),\n",
    "                        np.expand_dims(np.array(node_data[2]), 1)], 1)\n",
    "ndata1 = np.concatenate([np.expand_dims(np.array(node_data1[1]), 1),\n",
    "                         np.expand_dims(np.array(node_data1[2]), 1),\n",
    "                         np.expand_dims(np.array(node_data1[3]), 1)], 1)\n",
    "g.nodes['a'].data['feats'] = torch.tensor(ndata)\n",
    "g.nodes['b'].data['feats'] = torch.tensor(ndata1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Construct a heterogeneous graph with non-contiguous node Ids"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
